Cells to Civilizations by Coen Enrico


Author: Coen, Enrico
Language: eng
Format: epub
Publisher: Princeton University Press
Published: 2012-06-13T16:00:00+00:00


Staying on the Move

One of the key features of TD-learning is that it does not eliminate discrepancies; it shifts them. In the monkey experiment, the overall result is that the door sound, rather than the touch of the apple, comes to trigger the discrepancy neuron, so the discrepancy shifts to an earlier time. Whereas the monkey previously got excited when touching the apple, it now gets excited at the sound of the door. According to the proponents of TD-learning, this is because one consequence of the discrepancy neuron firing is to excite or give pleasure to the monkey. Indeed, dopamine, the neurotransmitter released by the discrepancy neuron (as studied by Romo and Schultz), is thought to play a major role in drug addiction: drugs such as cocaine and amphetamine are thought to exert their effects by enhancing dopamine action. Initially, touching the apple triggers the discrepancy neuron and dopamine release, and thus excites the monkey. After learning, the door sound behaves as the apple did at the beginning. From this neural perspective, the door sound has become a substitute reward: it triggers the discrepancy neuron and dopamine release just as the apple did. If a new stimulus, such as a light flash, occurs before the door sound, the monkey would treat this situation just as if the light flash were a predictor of reward. But in this case the “reward” is not touching the apple but the substitute reward of the door sound. If the monkey has many experiences of the light flash preceding the door sound, TD-learning ensures that the discrepancy neuron starts to fire even earlier, at the time of the light flash. We are building one expectation on another. Having learned to expect the apple reward from the door sound, the neural system automatically learns to respond to factors that might allow it to predict the door sound. The drive for learning does not stop; it has instead shifted to another stimulus.
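The shift described above can be reproduced with a very small TD(0) simulation. This is a minimal sketch under stated assumptions, not the model used by Romo and Schultz: each stimulus is treated as one discrete time step, the "discrepancy neuron" is modelled as the TD prediction error, the learning rate and discount factor are arbitrary illustrative values, and the names `run_trial` and `train` are hypothetical. Stage 1 pairs the door sound with the apple; stage 2 adds a light flash before the door sound.

```python
# Minimal TD(0) sketch of the monkey experiment. Assumptions not in the text:
# each stimulus is one discrete time step, ALPHA and GAMMA are arbitrary
# illustrative values, and the "discrepancy neuron" is modelled as the TD error.

ALPHA = 0.2  # learning rate (hypothetical value)
GAMMA = 1.0  # no discounting, so learned values converge to the reward size


def run_trial(values, events, learn=True):
    """Step through one trial and return the TD error at each event.

    events: list of (stimulus, reward) pairs in time order.
    The moment before the first stimulus is 'baseline'; its value stays 0
    because trials start at unpredictable times.
    """
    errors = []
    prev = "baseline"
    for stimulus, reward in events:
        # Discrepancy = reward received + value now expected - value expected before
        delta = reward + GAMMA * values.get(stimulus, 0.0) - values.get(prev, 0.0)
        errors.append(round(delta, 3))
        if learn and prev != "baseline":
            # Nudge the earlier prediction toward what actually followed
            values[prev] = values.get(prev, 0.0) + ALPHA * delta
        prev = stimulus
    return errors


def train(values, events, trials=200):
    """Repeat the same trial many times, updating the learned values."""
    for _ in range(trials):
        run_trial(values, events)


DOOR_THEN_APPLE = [("door", 0.0), ("apple", 1.0)]
LIGHT_DOOR_APPLE = [("light", 0.0), ("door", 0.0), ("apple", 1.0)]

if __name__ == "__main__":
    values = {}
    print("stage 1 before:", run_trial(values, DOOR_THEN_APPLE, learn=False))
    train(values, DOOR_THEN_APPLE)
    print("stage 1 after: ", run_trial(values, DOOR_THEN_APPLE, learn=False))

    # A light flash now precedes the door sound; the learned door value
    # plays the role of the substitute reward
    print("stage 2 before:", run_trial(values, LIGHT_DOOR_APPLE, learn=False))
    train(values, LIGHT_DOOR_APPLE)
    print("stage 2 after: ", run_trial(values, LIGHT_DOOR_APPLE, learn=False))
```

Before any learning the error is 1 at the apple and 0 at the door. After stage 1 it sits near 1 at the door and near 0 at the apple; once the light is introduced the error starts at the door, and after stage 2 it has moved to the light flash. The error is never removed, only moved to the earliest reliable predictor, which is the point the passage makes.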

The ability of learning to create substitute rewards in this way is so prevalent that it can be hard to tell an instinctive reward from a learned one. In many of Pavlov’s experiments, the smell or sight of meat was used to condition other responses. You might imagine that dogs have an inborn salivation response to meat and learn to respond to signals, such as the ring of a bell, that act as predictors of this reward. But the response to meat turns out not to be inborn; it is a result of conditioning. If a dog is fed on milk alone for a long time after birth, it does not salivate at its first smell or sight of meat. The smell or sight of meat starts to evoke salivation only after the dog has been fed meat a few times. The odor and appearance of meat then act as predictors of rewards such as a satisfying meal, and start to become rewards in their own right. What we





